On the basis of first-principles GW calculations, we study the quasiparticle properties of the
guanine, adenine, cytosine, thymine, and uracil DNA and RNA nucleobases. Beyond standard $G_0W_0$
calculations, starting from Kohn-Sham eigenstates obtained with (semi)local functionals, a simple
self-consistency on the eigenvalues yields vertical ionization energies and electron affinities
within an average error of 0.11 eV and 0.18 eV, respectively, as compared to state-of-the-art
coupled-cluster and multi-configurational perturbative quantum chemistry approaches. Further, GW
calculations predict the correct $\pi$-character of the highest occupied state, thanks to several
level crossings between density functional and GW calculations. Our study is based on a recent
Gaussian-basis implementation of GW with explicit treatment of dynamical screening through contour
deformation techniques.
The determination of the ionization energies (IE), electronic affinities (EA) and character of
the frontier orbitals of DNA and RNA nucleobases is an important step towards a better
understanding of the electronic properties and reactivity of nucleotides and nucleosides
along the DNA/RNA chains. Important phenomena such as nucleobase/protein interactions, defining
the DNA functions [1], or damage to the genetic material through oxidation or ionizing
radiation [2], are strongly related to these fundamental spectroscopic quantities.
Even though nucleobases in DNA/RNA strands are connected within the nucleotides to phosphate
groups through a five-carbon sugar, several studies show that the highest-occupied orbital
(the HOMO level) in nucleotides, which is responsible e.g. for the sensitivity of the molecule
to oxidation processes, remains localized on the nucleobases. Figure 1 shows the structures of
the DNA and RNA nucleobases, i.e. the purines - adenine (A) and guanine (G), and the
pyrimidines - cytosine (C) as well as thymine (T) in DNA and uracil (U) in RNA.

Besides the overarching fundamental interest in understanding complex biological processes at
the microscopic level, ab initio calculations of isolated nucleobases are interesting since
recent high-level quantum chemistry calculations help rationalize the rather large spread of
experimental results concerning the electronic properties of the nucleobases in the gas phase,
in particular as due to the existence of several isomers for guanine and cytosine [6]. Thus,
these molecules offer a valuable means to explore the merits of the so-called GW formalism for
isolated organic molecules, along the line of recent systematic studies of small molecules or
molecules such as fullerenes or porphyrins of interest for electronic or photovoltaic
applications [25].

In the present work, we study by means of first- 
principles GW calculations the quasiparticle properties 
of the DNA and RNA nucleobases, namely guanine, ade- 
nine, cytosine, thymine and uracil. We show in particular 
that the GW correction to the Kohn-Sham eigenvalues brings the ionization energies into much
better agreement with experiment and high-level quantum chemistry calculations. These results
demonstrate the importance of self-consistency on the eigenvalues when performing GW
calculations in molecular systems starting from (semi)local DFT functionals, and the merits of
a simple scheme based on a $G_0W_0$ calculation starting from Hartree-Fock-like eigenvalues.

The GW approach is a Green's function formalism usually derived within a functional derivative
treatment [14-16], which shows that the two-body Green's function ($G_2$), involved in the
equation of motion of the one-body time-ordered Green's function $G$, can be recast into a
non-local and energy-dependent self-energy operator $\Sigma(r, r'|E)$. This self-energy $\Sigma$
accounts for exchange and correlation in the present formalism. Since it is energy-dependent,
it must be evaluated at the quasiparticle energies $E = \varepsilon_i^{QP}$, where $i$ indexes
the molecular energy levels. This self-energy involves $G(r, r'|\omega)$, the dynamically
screened Coulomb potential $W(r, r'|\omega)$, and the so-called vertex correction $\Gamma$.
A set of exact self-consistent (closed) equations connects $G$, $W$, $\Gamma$, and the
independent-electron and full polarisabilities $\chi_0(r, r'|\omega)$ and $\chi(r, r'|\omega)$,
respectively. In the GW approximation (GWA), the three-body vertex operator $\Gamma$ is set to
unity, yielding the following expression for the self-energy:
where $v(r, r')$ is the bare (unscreened) Coulomb potential and $\tilde{W} = W - v$. The
$\phi_i$ are "zeroth-order" one-body eigenstates. Following the large body of work [18] devoted
to GW calculations in solids, surfaces, graphene, nanotubes, or nanowires, we use here
Kohn-Sham DFT-LDA eigenstates. It is shown below, and in Refs. [lljlljl^IIl, that
Hartree-Fock (or hybrid) solutions may constitute better starting points for molecular
systems. $(f_i, f_j)$ are Fermi-Dirac occupation numbers, and $\delta$ is an infinitesimal such
that the poles of $W$ fall in the second and fourth quadrants of the complex plane. In the GW
approximation, the self-energy operator can be loosely interpreted as a generalization of the
Hartree-Fock method in which the bare Coulomb potential is replaced by a dynamically screened
Coulomb interaction accounting both for exchange and (dynamical) correlations. An important
feature of the GW approach is that not only ionization energies and electronic affinities can
be calculated, but also the full quasiparticle spectrum. Further, both localized and infinite
systems can be treated on the same footing, with long- and short-range screening automatically
accounted for in the construction of the screened Coulomb potential $W$. More details about
the present implementation can be found in Ref. HH.

FIG. 1: (Color online) Schematic representation of the molecular structure of (a) guanine (G9K), (b) adenine, (c) cytosine
(C1), (d) thymine, and (e) uracil. Black, brown, red, white atoms are carbon, nitrogen, oxygen, and hydrogen, respectively.
The G9K and C1 notations for the guanine and cytosine tautomers are consistent with Ref. U
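
For orientation, the GW self-energy and its ingredients can be written in their standard textbook form using the symbols introduced in this section. This is only a sketch: the chemical potential $\mu$, the omission of spin factors, and the precise placement of the infinitesimals are assumptions of this summary and may differ from the conventions of the original expressions.

```latex
\begin{align}
\Sigma^{GW}(r, r'|E) &= \frac{i}{2\pi} \int d\omega \, e^{i\omega 0^+} \,
    G(r, r'|E+\omega) \, W(r, r'|\omega)
    \;=\; \Sigma_x(r, r') + \Sigma_c(r, r'|E) , \\
\Sigma_x(r, r') &= - \sum_i f_i \, \phi_i(r) \, \phi_i^*(r') \, v(r, r') , \\
\Sigma_c(r, r'|E) &= \frac{i}{2\pi} \int d\omega \, e^{i\omega 0^+} \,
    G(r, r'|E+\omega) \, \tilde{W}(r, r'|\omega) , \\
G(r, r'|\omega) &= \sum_i \frac{\phi_i(r) \, \phi_i^*(r')}
    {\omega - \varepsilon_i + i\delta \, \mathrm{sgn}(\varepsilon_i - \mu)} , \\
W(r, r'|\omega) &= v(r, r') + \int dr_1 \, dr_2 \;
    v(r, r_1) \, \chi_0(r_1, r_2|\omega) \, W(r_2, r'|\omega) , \\
\chi_0(r, r'|\omega) &= \sum_{i,j} (f_i - f_j) \,
    \frac{\phi_i^*(r) \, \phi_j(r) \, \phi_j^*(r') \, \phi_i(r')}
    {\omega - (\varepsilon_j - \varepsilon_i)
     + i\delta \, \mathrm{sgn}(\varepsilon_j - \varepsilon_i)} .
\end{align}
```

With this choice of infinitesimals the poles of $\chi_0$, and hence of $W$, lie in the second and fourth quadrants of the complex $\omega$ plane, as stated in the text.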

Our calculations are based on a recently developed implementation of the GW formalism (the
Fiesta code) using a Gaussian auxiliary basis to expand the two-point operators such as the
Coulomb potential, the susceptibilities or the self-energy. Dynamical correlations are
included explicitly through contour deformation techniques. We start with a ground-state DFT
calculation using the Siesta package and a large triple-zeta with double polarization (TZDP)
basis. We fit the radial part of the numerical basis generated by the Siesta code by up to
five contracted Gaussians in order to facilitate the calculation of the Coulomb matrix
elements and of the matrix elements $\langle \phi_i | \beta | \phi_j \rangle$ of the auxiliary
basis $(\beta)$ between Kohn-Sham states. Such a scheme makes it possible to exploit the
analytic relations for the products of Gaussian orbitals centered on different atoms or for
their Fourier transform [25]. Our auxiliary basis for first-row elements is the tempered
basis [31] developed by Kaczmarski and coworkers [32]. Such a basis was tested recently in a
systematic study of several molecules of interest for photovoltaic applications [25]. Four
Gaussians for each $l$-channel with localization coefficients $\alpha = (0.2, 0.5, 1.25, 3.2)$
a.u. are used for the (s,p,d) channels of C, O, and N atoms, while three Gaussians with
$\alpha = (0.1, 0.4, 1.5)$ a.u. describe hydrogen [33].
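
The analytic relation for products of Gaussians on different centers is the Gaussian product theorem. The Python sketch below checks it numerically in one dimension and prints the ratios between successive auxiliary exponents quoted above for C, N, and O. The function names, the centers, and the sample points are illustrative assumptions and are unrelated to the Fiesta or Siesta codes.

```python
import math


def gaussian(x, alpha, center):
    """Unnormalized 1D Gaussian exp(-alpha * (x - center)**2)."""
    return math.exp(-alpha * (x - center) ** 2)


def gaussian_product(alpha_a, center_a, alpha_b, center_b):
    """Gaussian product theorem: the product of two Gaussians is a single
    Gaussian with exponent p at center_p, scaled by a constant prefactor."""
    p = alpha_a + alpha_b
    center_p = (alpha_a * center_a + alpha_b * center_b) / p
    prefactor = math.exp(-alpha_a * alpha_b / p * (center_a - center_b) ** 2)
    return prefactor, p, center_p


# Exponents quoted in the text for the (s,p,d) channels of C, N, O (a.u.).
exponents = (0.2, 0.5, 1.25, 3.2)
ratios = [b / a for a, b in zip(exponents, exponents[1:])]

if __name__ == "__main__":
    # Two Gaussians on different (arbitrary, illustrative) centers.
    k, p, c = gaussian_product(0.2, 0.0, 0.5, 1.4)
    for x in (-1.0, 0.3, 2.0):
        direct = gaussian(x, 0.2, 0.0) * gaussian(x, 0.5, 1.4)
        merged = k * gaussian(x, p, c)
        print(f"x={x:5.2f}  product={direct:.12f}  single Gaussian={merged:.12f}")
    print("successive exponent ratios:", ratios)
```

The ratios between successive exponents come out close to 2.5, as expected for an approximately even-tempered sequence. In three dimensions the same identity holds with the squared distance between the two centers in the prefactor.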

Ionization energies. We now comment on the values of the calculated first ionization energy
(IE) as compiled in the Table and Fig. 2. The comparison to the experimental data is
complicated by the 0.2-0.3 eV range spanned by the various experimental reports (vertical
arrows in Fig. 2). An additional complication in the case of cytosine and guanine, beyond the
intrinsic difficulties in accurately measuring ionization energies in the gas phase, is that
several gas-phase tautomers exist which differ from the so-called C1-cytosine and G9K-guanine
isomers commonly found in DNA (see Fig. 1). State-of-the-art ab initio quantum chemistry
calculations, namely coupled-cluster CCSD(T) and multiconfigurational perturbation (CASPT2)
methods [5], studied the nucleobase tautomers that can be found along the DNA/RNA strands.
More recently, equation-of-motion coupled-cluster (EOM-IP-CCSD) calculations were performed
on several isomers. All methods agree to within 0.04 eV for the average IE of the A, G, C, T
tautomers we consider here, with a maximum discrepancy of 0.09 eV in the case of thymine. The
CASPT2 and CCSD(T) calculations agree to within 0.03 eV for all molecules. These theoretical
IEs are commonly considered as the most reliable references and fall within the experimental
error bars, except for the cytosine (C1) case where the calculated IEs are slightly smaller
than the experimental lower bound [34] (see Table and Fig. 2).

Clearly, the ionization energy within DFT-LDA, as given by the negative HOMO Kohn-Sham level
energy, significantly underestimates the IE, by an average of 2.5 eV (29%). The self-energy
correction at the $G_0W_0$(LDA) level improves the situation very significantly by bringing
the error to an average 0.5 eV (5.7%) as compared to state-of-the-art quantum chemistry
results. However, as emphasized in recent papers [19,25,27,28], the overscreening induced by
starting with LDA eigenvalues, which dramatically underestimate the band gap, tends to
produce ionization energies that are too small. This problem can be solved at least partly by
performing a simple self-consistency on the eigenvalues. We shall refer to this approach as
GW henceforth. Such a self-consistency on the eigenvalues leads to a much reduced average error
FIG. 2: (Color online) Ionization energies in eV. The vertical (maroon) error bars indicate
the experimental range. Triangles up (light blue): LDA values; (green) squares:
$G_0W_0$(LDA) values; full black diamonds: GW values; (red) empty circles (QuantChem
abbreviation): quantum chemistry, namely CCSD(T), CASPT2 and EOM-IP-CCSD, values
(see text).



of 0.11 eV (~1.3%) as compared to the quantum chemistry reference. This good agreement
certainly indicates the reliability of the present GW scheme for such systems. As shown in
Fig. 2, the largest discrepancies are observed for guanine and adenine (the purines), while
the agreement is excellent for the three remaining bases.
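
The logic of self-consistency on the eigenvalues can be illustrated with a deliberately crude toy model. In the Python sketch below, the quasiparticle gap is a hypothetical "bare" gap reduced by a screening term that grows when the input gap shrinks, which mimics the overscreening obtained from a too-small LDA gap. The functional form and all numbers are assumptions chosen for illustration only; this is not the GW self-energy and not the scheme implemented in the Fiesta code.

```python
import math


def update_gap(gap_in, gap_bare=12.0, screening=8.0):
    """Toy one-shot update: a smaller input gap means stronger screening,
    hence a larger reduction of the (hypothetical) bare gap. Units: eV."""
    return gap_bare - screening / gap_in


def eigenvalue_self_consistency(gap_start, tol=1e-8, max_iter=100):
    """Feed the output gap back as the input gap until it stops changing.
    Only the eigenvalues (here, the gap) are updated, never the orbitals."""
    history = [gap_start]
    gap = gap_start
    for _ in range(max_iter):
        new_gap = update_gap(gap)
        history.append(new_gap)
        if abs(new_gap - gap) < tol:
            break
        gap = new_gap
    return history


if __name__ == "__main__":
    history = eigenvalue_self_consistency(gap_start=4.0)  # "LDA-like" small gap
    print("one-shot (G0W0-like) gap :", history[1])
    print("self-consistent gap      :", history[-1])
    print("iterations               :", len(history) - 1)
```

In this toy, the one-shot result obtained from the small starting gap lies below the converged value, and iterating on the eigenvalues removes most of the dependence on the starting point.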

In recent work, it was shown that for small molecules a non-self-consistent $G_0W_0$
calculation starting from Hartree-Fock eigenstates leads to better ionization energies than a
fully self-consistent GW calculation where the wavefunctions are updated as well [19,27].
Consistent with this observation, a simple scheme relying on a Hartree-Fock-like approach was
successfully tested on silane, disilane, and water [28], and on larger molecules such as
fullerenes or porphyrins [25]. In this "$G_0W_0$ on Hartree-Fock (HF)" ansatz, the input
eigenvalues $(\varepsilon_n)$ are computed within a diagonal first-order perturbation theory
where the DFT exchange-correlation contribution to the eigenvalues is replaced by the Fock
exchange integral, namely:



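$$
\varepsilon_n = \varepsilon_n^{\mathrm{LDA}}
  + \langle \phi_n^{\mathrm{LDA}} | \Sigma_x - V_{xc}^{\mathrm{LDA}} | \phi_n^{\mathrm{LDA}} \rangle ,
$$

where $\Sigma_x$ is the Fock operator. This approach, labeled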
$G_0W_0$(HF$_{\rm diag}$) in the Table, produces an average error of 0.22 eV (~2.6%). This
good agreement with both the GW and quantum chemistry calculations clearly speaks in favor of
this simple scheme for molecular systems, or of the full $G_0W_0$(HF) calculations tested in
Ref. [H, which also avoid seeking self-consistency. A difficult issue lying ahead concerns
e.g. hybrid systems, such as semiconducting surfaces grafted with organic molecules, for which
it is not quite clear what the best starting point should be.
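
The Hartree-Fock-like starting eigenvalues amount to a one-line correction per level once the diagonal matrix elements are known. The Python sketch below spells this out; the function name and all numerical values are hypothetical placeholders chosen for illustration, not results from this work.

```python
def hf_like_eigenvalue(eps_lda, sigma_x_diag, vxc_lda_diag):
    """Diagonal first-order correction:
    eps_n = eps_n^LDA + <phi_n| Sigma_x - V_xc^LDA |phi_n>.
    All arguments are in eV and refer to a single level n."""
    return eps_lda + (sigma_x_diag - vxc_lda_diag)


# Hypothetical (eps_LDA, <Sigma_x>, <V_xc^LDA>) values in eV, for illustration only.
levels = {
    "HOMO": (-6.0, -15.0, -12.0),
    "LUMO": (-2.0, -5.0, -8.0),
}

hf_like = {name: hf_like_eigenvalue(*values) for name, values in levels.items()}

if __name__ == "__main__":
    for name, eps in hf_like.items():
        print(f"{name}: LDA {levels[name][0]:6.2f} eV -> HF-like {eps:6.2f} eV")
    lda_gap = levels["LUMO"][0] - levels["HOMO"][0]
    hf_gap = hf_like["LUMO"] - hf_like["HOMO"]
    print(f"gap: LDA {lda_gap:.2f} eV -> HF-like {hf_gap:.2f} eV")
```

With these placeholder numbers the occupied level moves down and the empty level moves up, so the starting gap opens before any GW step is performed. The wavefunctions themselves are left unchanged.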

Next, we address the character of the HOMO level of cytosine and uracil, which changes from
DFT-LDA to GW calculations. We plot in Fig. 3(a-d) the C1-cytosine DFT-LDA Kohn-Sham HOMO to
(HOMO-3) eigenstates. The LDA HOMO level is an in-plane $\sigma$ state with a strong component
on the $(p_x, p_y)$ oxygen orbitals. Such a state is labeled $\sigma_O$ in the Table and in
the following. The (HOMO-1) level is a more standard $\pi$ state with weight on the oxygen
$(p_z)$ orbital and a delocalized benzene-ring $\pi$ molecular orbital. Within the
$G_0W_0$(LDA), GW and $G_0W_0$(HF$_{\rm diag}$) approaches, the LDA HOMO $\sigma_O$ state is
pushed to a significantly lower energy and the $\pi$ state becomes the HOMO level. This level
crossing brings the GW calculations into agreement with many-body quantum chemistry
calculations, which all predict the $\pi$ state to be the HOMO level. The same level crossing
is observed in the case of uracil, with the LDA HOMO and (HOMO-1) levels being $\sigma_O$ and
$\pi$ states, respectively, while all GW results and quantum chemistry calculations predict
the reverse ordering. Our interpretation is that the very localized $\sigma_O$ state suffers
much more from the spurious LDA self-interaction than the rather delocalized $\pi$ state.
Even though it would be wrong to reduce the dynamical GW self-energy operator to a
self-interaction-free functional, the GW correction certainly cures this well-known problem
in part. The other bases, namely guanine, adenine, and thymine, all show the correct
$\pi$-character for the HOMO level already at the LDA level.

The HOMO to (HOMO-1) energy difference averages 
to 0.80 eV and 1.12 eV within CASPT2 and EOM-IP- 
CCSD, respectively. Clearly, the average LDA energy 





FIG. 3: (Color online) Isodensity surface plot of the HOMO ($\sigma_O$), HOMO-1 ($\pi$),
HOMO-2 ($\sigma$), and HOMO-3 ($\pi'$) LDA Kohn-Sham eigenstates of cytosine. Within GW, the
ordering of states becomes $\pi$, $\pi'$, $\sigma_O$, $\sigma$ for HOMO to HOMO-3 (see
text).
TABLE I: Vertical ionization energies and electronic affinities in eV as obtained from the negative Kohn-Sham eigenvalues
(LDA-KS), from non-self-consistent $G_0W_0$(LDA) calculations, from a GW calculation with self-consistency on the eigenvalues
(GW), and from a non-self-consistent $G_0W_0$(HF$_{\rm diag}$) calculation starting from Hartree-Fock-like eigenvalues. The
$\sigma$ or $\pi$ character of the wavefunctions is indicated when the GW correction changes the level ordering as compared to
DFT-LDA (see text). The acronyms CAS, CC and EOM stand for CASPT2, CCSD(T) and equation-of-motion coupled-cluster high-level
many-body quantum chemistry calculations, respectively. Theoretical values are reported for the C1-cytosine and G9K-guanine,
while the experimental values average over several tautomers. The MAE is the mean absolute error in eV as compared to the
quantum chemistry reference calculations in columns 6 and 7. a Ref. @. b Ref. 0. c Ref. @. d Compiled in Ref. [j. e Compiled in
Ref. @. f Ref. [T3. g Ref. @.



spacing of 0.22 eV is significantly too small. We find that the 0.77 eV $G_0W_0$(LDA) average
value is close to the CASPT2 results, while the larger 1.29 eV GW result falls closer to the
EOM-IP-CCSD energy difference. Averaging over all isomers, the experimental HOMO to (HOMO-1)
energy spacing comes to 0.97 eV, in between the $G_0W_0$(LDA) or CASPT2 results and the GW or
EOM-IP-CCSD values. Even though it is too early for final conclusions about the merits of the
various approaches, it seems fair to state that the LDA value is significantly too small, and
that the situation is improved significantly by the GW correction.
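
The comparison above is simple arithmetic on the average HOMO to (HOMO-1) spacings quoted in the text. The short Python sketch below tabulates the deviation of each method from the 0.97 eV experimental average; only the variable names are introduced here, the numbers are those given in this paragraph and the preceding one.

```python
# Average HOMO to (HOMO-1) spacings in eV, as quoted in the text.
spacings = {
    "LDA": 0.22,
    "G0W0(LDA)": 0.77,
    "CASPT2": 0.80,
    "EOM-IP-CCSD": 1.12,
    "GW": 1.29,
}
experiment = 0.97  # eV, averaged over all isomers

# Signed deviation of each method from the experimental average.
deviations = {name: round(value - experiment, 2) for name, value in spacings.items()}

if __name__ == "__main__":
    for name, dev in sorted(deviations.items(), key=lambda item: abs(item[1])):
        print(f"{name:12s} {spacings[name]:5.2f} eV   deviation {dev:+.2f} eV")
```

The $G_0W_0$(LDA) and CASPT2 averages fall below the experimental value and the EOM-IP-CCSD and GW averages above it, while the LDA spacing is off by about 0.75 eV.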

Electronic affinities. We conclude this study by exploring the electronic affinity (EA) of
the nucleobases. They are provided in the Table as the negative of the LUMO Kohn-Sham
energies. Experimental data for guanine are missing. Further, the CASPT2 and CCSD(T)
results [5] are clearly larger (in absolute value) than the highest experimental estimates.
While again part of the discrepancy may come from the presence of several tautomers in the
gas phase, it certainly results as well from the fact that the electronic affinity is
negative. A detailed discussion of the experimental difficulties in probing unbound states is
presented in Ref. 0. Taking again the CCSD(T) and CASPT2 calculations [5] as a reference, the
GW electronic affinities are quite satisfactory, with a MAE of 0.18 eV. Such an agreement is
rather impressive since the LDA electronic affinities show the wrong sign, with a discrepancy
as compared to CASPT2 ranging from 2.9 eV to 3.6 eV. We observe that while the $G_0W_0$ EAs
are smaller (in absolute value) than the quantum chemistry ones, the GW EAs are larger. This
contrasts with the IE case, where both $G_0W_0$ and GW values were smaller (see Fig. 2).
Similar to the quantum chemistry case, the GW values are found to systematically overestimate
the experimental results. Further study is needed to understand such a discrepancy between
theoretical and available experimental results.

In conclusion, we have studied on the basis of ab initio GW calculations the ionization
energies and electronic affinities of the DNA and RNA nucleobases guanine, adenine, cytosine,
thymine and uracil. While a standard $G_0W_0$(LDA) calculation yields ionization energies that
are 0.5 eV away from CCSD(T)/CASPT2 reference quantum chemistry calculations,
self-consistency on the eigenvalues brings the agreement to an excellent 0.11 eV mean
absolute error. A simple $G_0W_0$ calculation starting from Hartree-Fock-like eigenvalues,
avoiding the need for self-consistency, yields an agreement of 0.22 eV. The possibility of
bringing the calculated values to within 0.1-0.2 eV of state-of-the-art reference
calculations with the GW formalism, a scheme which treats both finite-size and extended
systems with an $N^4$ scaling and gives access to the full quasiparticle spectrum, paves the
way to further studies of larger DNA strands and biological systems in general.